26 Jan 2024

Fundamental Principles of Official Statistics

  • Clear mention of the process used to produce statistics
  • To retain trust in official statistics, the statistical agencies need to decide according to strictly professional considerations, including scientific principles and professional ethics, on the methods and procedures for the collection, processing, storage and presentation of statistical data.

Usual practice: Theory vs reality



Usual practice: In the end

What are the issues?

  • Lots of files
  • Cut and paste is not a reliable, reproducible approach!
  • Each operator has their own approach
  • Several versions of code may coexist
  • Mistakes hard to track
  • The steps aren’t recorded
  • Testing is hard
  • Reproducibility is not guaranteed
  • Quality is controlled only at the end

What is a Reproducible Analytical Pipeline (RAP)?

Source: The Turing Way



  • It is a process
  • It is easily repeatable
  • It is easily extendable
  • It is automated
  • It minimises mistakes
  • It is fast
  • It builds trust

What does a RAP look like?

It is a simple process:


  • linking inputs (data)
  • to outputs (publication)

What does a RAP look like?

This process can be decomposed:


  • Succession of tasks
  • Direct linkage of actions
  • Different software can be used

What does a RAP look like?

This process can be decomposed:


  • Each task is coded
  • \(\hookrightarrow\) No manual actions
  • Each task uses inputs
  • Each task produces outputs
  • \(\hookrightarrow\) Easy to test tasks individually
  • \(\hookrightarrow\) Each output is identified
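
This decomposition can be sketched in Python as a toy pipeline. The function names, the sample records and the three steps are hypothetical and chosen only for illustration. Each task is a function that takes inputs and returns an identified output:

```python
"""Toy pipeline: every task is a function with explicit inputs and outputs."""
from statistics import mean


def load_data():
    # Hypothetical raw input; a real pipeline would read a file or a database.
    return [
        {"region": "North", "income": 120},
        {"region": "North", "income": 80},
        {"region": "South", "income": None},  # missing value
        {"region": "South", "income": 100},
    ]


def clean_data(records):
    # Task 1: drop records with a missing income.
    return [r for r in records if r["income"] is not None]


def summarise(records):
    # Task 2: compute the mean income per region.
    by_region = {}
    for r in records:
        by_region.setdefault(r["region"], []).append(r["income"])
    return {region: mean(values) for region, values in by_region.items()}


def format_table(summary):
    # Task 3: turn the summary into text lines ready for publication.
    return [f"{region}: {value:.1f}" for region, value in sorted(summary.items())]


def run_pipeline():
    # The tasks are linked directly, with no manual step in between.
    return format_table(summarise(clean_data(load_data())))


if __name__ == "__main__":
    print("\n".join(run_pipeline()))
```

Because every task has its own inputs and outputs, each one can be checked on its own (for example, that `clean_data` removes the record with the missing income) before the whole pipeline is run.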

What does a RAP look like?

This process is documented:


  • Each piece of code is versioned
  • Versions are annotated
  • \(\hookrightarrow\) Easy to follow the development of each task
  • \(\hookrightarrow\) Easy to track mistakes

What does a RAP look like?

This process is saved:


  • Each piece of code is securely saved
  • Each version can be reverted
  • \(\hookrightarrow\) Easy to undo/revert to a past version
  • \(\hookrightarrow\) Easy to test

What are the benefits?

Source: The Turing Way

Analyses within a RAP are:

  • Easy to use
  • Easy to find information
  • Easy for others to use
  • Easy to revise and adapt
  • Easy to reuse
  • Automated and fast
  • Open and promoting trust

What do we need?

  • A good knowledge of the process

  • A good organisation:

    • of files
    • of code
    • of documentation
  • Open-source software

  • A versioning system

  • Time to learn

Why open-source instead of proprietary?

Source: The Turing Way


Open source tools:

  • Have a huge supportive online community
  • Are reviewed by their community, so issues tend to be fixed swiftly
  • Are transparent in their content
  • Can interoperate with other software
  • Are free for anyone to use and share

What is version control?

Tracking the three Ws:

  • Who made this?
  • Which change(s)?
  • Why?
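
As a rough illustration of the three Ws, here is a toy Python model of a change record. It is not how any real versioning system is implemented, and the class, the field names and the sample entries (`analyst_a`, `analyst_b`) are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    who: str   # Who made this?
    what: str  # Which change(s)?
    why: str   # Why?


def describe(change):
    # One readable line per change, similar in spirit to a history listing.
    return f"{change.who}: {change.what} (why: {change.why})"


def changes_by(log, author):
    # Filter the history to the changes made by one person.
    return [c for c in log if c.who == author]


# Hypothetical history of two annotated changes.
log = [
    Change("analyst_a", "Add cleaning step for missing incomes", "Raw file contains blanks"),
    Change("analyst_b", "Fix regional totals", "Totals double-counted one region"),
]

if __name__ == "__main__":
    for change in log:
        print(describe(change))
```

A real version control system records this kind of information automatically with every saved change, which is what makes the history easy to search later.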

Why use version control?


  • One place to store your code
  • You and collaborators are free to write and develop locally
  • Complete documented history of all changes made
  • Easy to share
  • Your future self will thank you!

The 4 Rs!

Source: The Turing Way

An analysis can be:

  • Reproducible
  • Replicable
  • Robust
  • Reusable

What do we mean by reproducible?

A project is reproducible if it returns the same results when redone with the same data and the same analysis (same code).

Source: The Turing Way
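
One way to check this definition in practice is to run the same code twice on the same data and compare a fingerprint of the results. The sketch below assumes a hypothetical analysis with a random resampling step; the function names, the data and the seed value are illustrative only:

```python
import hashlib
import json
import random
from statistics import pstdev


def analysis(data, seed=2024):
    # A local generator with a fixed seed makes the resampling repeatable.
    rng = random.Random(seed)
    resampled_means = [
        sum(rng.choices(data, k=len(data))) / len(data) for _ in range(200)
    ]
    return {
        "mean": sum(data) / len(data),
        "bootstrap_se": pstdev(resampled_means),
    }


def fingerprint(result):
    # Hash a canonical text form of the result so two runs can be compared.
    text = json.dumps(result, sort_keys=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]  # hypothetical input data
    first = fingerprint(analysis(data))
    second = fingerprint(analysis(data))
    print("reproducible:", first == second)
```

Without the fixed seed, the resampling step would draw different samples on each run, so the two fingerprints would most likely differ even though the data and the code are unchanged.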


What are the benefits?

  • Helps build trust
  • Not reliant on a single individual
  • Can be adapted and re-used

Building a RAP process can be difficult

Before we start, here are a few things to consider:

  • IT infrastructure available
  • Data privacy - where and how am I storing my data?
  • Expertise - what training do I need?
  • Legacy systems - what are the barriers to transitioning?

But it is worth it!

Source: The Turing Way

And we don’t have to do it all at once, or alone



The building blocks of a RAP:

  • Reproducible code
  • Using open-source tools
  • Version control
  • … all useful, each will improve a specific dimension of the process.

RAP in practice

Source: The Turing Way

  • Implemented in some NSOs (Vanuatu)
  • Can be done easily with R/RStudio
  • Can also be done with Python/Jupyter notebooks or Quarto (R, Python, Julia, others…)
  • Large community to help

Let’s Start

Useful resources

  • The UK government RAP website.

  • UK best practice documentation.

  • A free RAP course to teach you all you need to know.

  • How the Data Science Campus sets its coding standards.

  • A new open-source book from the Alan Turing Institute setting out how to do reproducible data science.

Citing The Turing Way

Many of the beautiful images used in this presentation were taken from The Turing Way book.

Full citation:

The Turing Way Community, Becky Arnold, Louise Bowler, Sarah Gibson, Patricia Herterich, Rosie Higman, … Kirstie Whitaker. (2019, March 25). The Turing Way: A Handbook for Reproducible Data Science (Version v0.0.4). Zenodo. http://doi.org/10.5281/zenodo.3233986